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Abstract 

We test the performance of five different interface preconditionings for domain-decomposed 
convection-diffusion problems, including a novel one known as the spectral probe, while varying 
mesh parameter, Reynolds number, ratio of subdomain diffusion coefficients, and domain aspect 
ratio. The preconditioners are representative of the range of practically computable possibilities 
that have appeared in the domain decomposition literature for the treatment of nonoverlapping 
subdomains. We demonstrate through a large number of numerical examples that no single precon- 
ditioner can be considered uniformly superior or uniformly inferior to the rest, but that knowledge 
of particulars, including the shape and strength of the convection, is important in selecting among 
them in a given problem. 


1* Introduction 

The solution of linearized convection-diffusion equations of the form 

c • V0 - V • cS/(j) = /, 


(i.i) 


where <f> is a conserved quantity (energy, mass fraction, momentum component, etc.) transported 
under the influence of velocity field c and diffusivity c is required throughout computational physics. 
Discretization by finite differences or finite elements results in a large sparse system of algebraic 
equations whose solution can be demanding in computational resources and is one of the many 
driving forces for parallel computation. Because the strength of coupling between a pair of dis- 
crete unknowns governed by an equation like (1.1) decays with physical separation (more or less 
isotropically depending upon c), it is natural to partition the problem spatially when looking for 


concurrency in the solution algorithm. Parallelism is, in fact, only one of several compelling reasons 
for the recent surge of research on domain decomposition algorithms exemplified by the volumes [9, 
10, 17]. Others include a powerful theory describing optimal or near-optimal algebraic convergence 
rates for hierarchical preconditioners of domain-decomposed type, the convenience of composite ar- 
ray data structures for describing complex shapes, a desire to employ solution techniques and quality 
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software restricted to problems with various local uniformity requirements (which are subproblems 
with regard to (1.1)), and sheer problem size, which can ultimately push numerical ill-conditioning 
and serial memory traffic beyond acceptable limits. 

Preconditionings for interfacial degrees of freedom have been the focus of much attention dur- 
ing the development of domain decomposition methods, and deservedly so, since interfaces of lower 
dimension than the original domain of definition of the partial differential equation are created by 
a predominant form of nonoverlapping decomposition related to nested dissection of the underlying 
finite difference or finite element matrix operator. We refer generically to such forms of domain 
decomposition as Schur iteration, since elimination of the subdomain interiors leaves a Schur com- 
plement system for the separator unknowns. Additional interest in interface preconditioning comes 
from the fact that the classical Schwarz iteration, the prototype for overlapping decompositions, 
has been placed into correspondence with a stationary iteration having as unknowns the interfa- 
cial degrees of freedom of a nonoverlapping decomposition [6, 11]. This correspondence between 
Schwarz and Schur methods enriches the study of domain decomposition algorithms in general, 
because properties which are more easily analyzed in one framework may be extended to the other. 

The present contribution focuses on the performance of a variety of easily computed Schur 
complement preconditioners in a rather special context: a single interface dividing a rectangle into 
two subrectangles in which the capability of performing exact solves is presumed. We consider a 
scalar convection -diffusion operator under a uniform or “terraced” diffusion coefficient and a variety 
of continuity-satisfying flow fields chosen to exhibit the relative advantages and disadvantages of 
the preconditioners. The pristine nature of the problem class allows focusing on the quality of the 
interfacial preconditioning, alone, in four different limits: large discrete problem size, large Reynolds 
(or Peclet) number, large diffusion coefficient ratio, and large aspect ratio. (The Reynolds number 
is the dimensionless ratio c//c, where c is a characteristic velocity, l a characteristic length, and l a 
characteristic diffusivity. Large values characterize strongly nonsymmetric, convectively-dominated 
systems.) Any or all of these limits could be important in a production engineering code whose 
parallelization might be sought through domain decomposition. We show that no single interface 
preconditioner is best in all limits, and therefore seek to qualitatively rank their sensitivities to 
these limits and identify realms of superiority. 

Several different coefficient fields c and c are studied because the performance of all of the 
preconditioners are sensitive to them and unjustified optimism or pessimism can result from too 
narrow a study. Two of our five preconditioners have been amply studied previously in the sym- 
metric positive definite context of pure diffusion. There have been very few studies of any of them 
in the convection-diffusion context, and since this case is also relatively untouched by theoretical 
approaches, apart from spatially invariant velocity distributions, numerical studies are continuing 
to yield interesting information. 

We comment briefly on a few other issues which bear on our choice of scope. It is possible to 
set up an alternative framework for nonoverlap ping decompositions in which interfacial coupling 
is simply discarded, or partially accounted for in ways that do not require special treatment of a 
separator set; see, e.g [l] and [26]. In so doing one obtains the advantages of greatly simplified 
coding and less inter-domain data traffic per iteration. Problems dominated by local interactions 
can be handled quite acceptably by decoupling; see c.g [23]. However, in problems which are 
diffusively dominated (more fundamentally, problems whose Green’s functions have support which 
is not substantially confined within artificial subdomain boundaries), such approaches haye limited 
applicability to large numbers of gridpoints and/or subdomains. 

The special case of a single interface obviates discussion of preconditioning the set of vertices 
where multiple interfaces intersect. Vertex preconditioning is very important but also more readily 
prescribable, at least in two dimensions. A coarse grid problem for the vertices having the same 
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structure as the undecom posed original problem can be derived directly from the differential op- 
erator by employing a hierarchical basis discretization. The interface system, on the other hand, 
corresponds to a pseudo- differential operator, the numerical analysis of which is relatively less well 
developed in the presence of convective terms. In a preconditioner consisting of component blocks 
corresponding to subdomains, vertices, and interfacial edges (and interfacial planes in three dimen- 
sions), any one block can limit the overall performance. A study of interface preconditioning is 
thus necessary, but not sufficient, for guiding the construction of complete preconditioners. 

Finally, as to the relevance of our scope, we note that practical problems often involve several 
simultaneous convection-diffusion operators linked through coefficients or source terms. Continued 
study of the scalar case is, however well motivated by techniques such as the alternating block 
factorization [4] which successfully employ scalar preconditioners inside of a change of dependent 
variables which partially decouples the original system. 

The algorithmic framework of our experiments is described in Section 2, followed by intro- 
duction of the five interface preconditioners and a brief discussion of their properties in Section 3. 
Section 4 contains performance measurements in the form of iteration counts along several axes of 
problem parameter space. We draw some conclusions and recommendations in Section 5. 


2. Schur Domain Decomposition Methods 

We take as our starting point the matrix equation Ax = b arising from a finite difference dis- 
cretization of of (1.1). The domain decomposition method we employ is an iterative substructuring 
method consisting of three elements: (1) the operator A whose inverse action we would like to 
compute with an accuracy commensurate with the discretization, (2) an approximation B to A, 
whose inverse action is computationally convenient to compute, and (3) an acceleration scheme for 
the preconditioned system which requires only the ability to form the actions of A and B~ l on a 
vector. In all cases reported herein, A is derived from a second-order central differencing of the 
diffusion term and a first-order upwind differencing of the convection term. Extensions to second- 
order upwind differencing have been carried out in, for instance, [27]. We use right-preconditioned 
GMRES [30] as our iterative acceleration scheme, that is, wc solve AB~ l y = 6 by the applying the 
standard GMRES algorithm to (AB~ l ) then recover x through the solution of Bx = y . 

GMRES is guaranteed to converge in a finite number of steps for nonsingular AB~ l even in 
the presence of nonsymmetry or indefiniteness, assuming exact arithmetic. The maximum number 
of steps required is the number of distinct eigenvalues of the preconditioned operator. This con- 
vergence result depends upon dynamically storing a complete basis for the Krylov space built from 
powers of acting on the initial residual vector. For large problems, this much memory can 

easily become excessive, and GMRES is often truncated or restarted [30] in cases where it does 
not converge within a predetermined number of steps. However, wc allow full GMRES iteration 
in our experiments, up to some maximum number of steps (set at 30 herein) which is sufficient in 
all but two cases. Because fewer than 30 steps are almost always sufficient, we effectively suppress 
from consideration the restart or truncation parameter. This parameter can be important in a 
“production” setting. 

The substructuring enters through the manipulation of A and B into forms which possess 
large block zeros, for the sake of concurrency or for some of the other reasons noted in the intro- 
duction. For elliptic operators such as (1.1), A is irreducible; hence there are no block triangular 
permutations. However, if the domain is cut by the removal of a swath of gridpoints as wide as the 
semi- band width of the stencil, two large subproblems arc created whose only coupling is through 
the small removed set. For five-point stencils on logically tensor product grids, we may choose 
a single row or column of unknowns. (A two-point- wide generalization has been studied for the 
thirteen-point biharmonic stencil in [8].) Ordering the separators last, we obtain 
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( 2 . 1 ) 


//111 0 1 3 \ / * l\ / &l\ 

Ax = 0 An An I x 2 l = I b 2 1 = b. 

\A 3 1 ^32 ^ 33 / \^ 3 / \* 3 / 

Here, /In and /I22 are five-point operators with bandwidth no larger than that of the naturally 
ordered original system, but A33, which renders the coupling between the points on the interface 
itself, is tridiagonal. The other blocks contain the coupling of the separator unknowns to the 
subdomains, and vice versa. From the point of view of the continuous operator they represent 
derivatives in directions normal to the interface. 

Block Gaussian elimination of the unknowns xi and x 2 would yield the Schur complement 


system 

Cx 3 = d 

(2.2) 

for X3, where 

C = A33 - A31 Aj", 1 A i 3 - A32A22 1 A23 

( 2 . 3 ) 

and 

d = 63 — A31 A“% — A 32 A. 22* 5 / ■ 

( 2 . 4 ) 


If X3 can be found, the subdomain problems are decoupled. However, direct computation of the 
generally dense C in order to solve (2.2) requires as many pairs of exact subdomain solves as there 
are degrees of freedom in x 3 , which is generally prohibitive. It is also unnecessary inasmuch as 
iterative techniques have been devised which require many fewer iterations than the dimension 
of x 3 , and which furthermore require only approximate subdomain solves in each iteration. As 
mentioned already, we shall ignore the option of inexact subdomain solves in the sequel, effectively 
reducing the iterations to the interface, but we nevertheless make use of a general purpose code 
which retains the interior degrees of freedom in carrying out the numerical experiments. 

We consider two families of preconditioners D, the structurally symmetric 

(An 

n 1=0 

V'tai 

/ An 0 

= 0 A Vi 

\ ^31 A32 

where M approximates the Schur complement C ( 2 . 3 ) of An and A22 in A, and the simpler block 
triangular 

(A\\ 0 Aj 3 \ 

#2 = ( 0 A 22 An I . 

\ 0 0 M ) 

The factorized form of /I \ above shows that the cost of applying the inverse of B\ is one solve 
with M and two solves each with An and A22* There is an inherent sequentiality to the subdomain 
solves, however, since the system involving M in the left factor requires data from the first set of 
subdomain solves. The inverse of B 2 can be applied to a vector at the cost of solving one system 
each with M, An, and A l2 . The system for M is solved first, followed by independent solves in 
the subdomains which use the interface values as boundary conditions. 

We assume throughout that the An are invertible. (This is certainly a reasonable requirement 
for a discrete convective-diffusive operator and is guaranteed heiein for all Reynolds numbers by 
upwind differencing.) Under this assumption, C is also invertible [ 15 ]. , 


0 0 \ // 0 A^AjaX 

A22 0 0/ A.-M23 

A32 MJ \0 0 I J 

A 13 
A23 

M + A 3 1 A^, 1 A 1 3 + A32 A^ A23 
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For matrices arising from standard quasi-uniform finite element discretizations of elliptic partial 
differential equations, A lias a condition number of 0(h~ 2 ), whereas C has a condition number of 
0(h~ l ) [5, 29], The equivalence of conjugate gradient iterations on the Schiir complement system 
with preconditioner M and on the full substructured matrix A with preconditioner B\ was shown 
in [24]. 

For reference in Section 4, it is interesting to note the forms of the preconditioned ’operators 
AB± l , and AB% 1 • In order to make the formulae more readable, we combine the independent 
subdomain solves into a block matrix /lq, and denote the separator block by Ar, to re-express the 
above matrices as 


whence 

and 


, _ ( do 'lor \ n - ( do dnr \ n _ {An i4nr\ 

\drn dp / 1 \dpn A/ + /trad^'/lnr/ 2 \ 0 M ) ' 

dn l + d^ 1 -A^ 1 AnvM~ l \ 

-M-'sU^Au 1 M - 1 ) 


+ A] 


B- i-( A u -An'A ar M-'\ 

" a ~ { 0 M- 1 ) ■ 

am these expressions it can easily be verified that 

AB7 l = (.. „ *1 . l and AB~ l — ( 


(2.5) 


It is evident that if C is exactly represented by M , then AB f 1 reduces to the identity, and 
an iteration involving AB^ 1 will converge in one step requiring two sets of subdomain solves. 
An iteration involving A 77.J 1 , on the otlier hand, will converge in two steps (from an arbitrary 
initial guess) if M = C, but each step requires only one set of subdomain solves. (These iteration 
counts do not include the final solve with either B i or Bj which is required to unwind the right- 
preconditioning.) More generally, if M is sufficiently close to C in the sense that the lower-left block 
of the structurally symmetric system is small, ||(7 - CM~ l )\\ <C 1, we expect that an iteration 
based on B 2 will require an extra iteration relative to an iteration based on B\, Conversely, if M is 
a poor preconditioner for C, so that the lower left block becomes large, the use of the structurally 
symmetric system could require more iterations than the use of the block triangular system. Both 
behaviors arc illustrated in Section 4. 

Note from (2.5) that /l/Jj" 1 and A7J.J 1 have identical spectra, as Arnoldi estimates for the 
eigenvalues obtained as a by-product of the GMIl.ES iterations also show. However, Krylov se- 
quences based on the respective operators will in general differ, and there is little that can be said 
about which method will lead to faster convergence for general c if M and C are not sufficiently 
close. 

For some of the preconditioner components M we consider, the overall preconditioning process 
is numerical unstable, as will be seen in Section 4. Even though the iterations involving AB~ l may 
converge, the final result after unwinding the preconditioning may have few or even no significant 
figures. For this reason, we always check the actual residual ||/ — Ax\\ at the end of each calculation. 


3. Scliur Interface Preconditioners * 

We proceed to delineate five alternatives for the matrix M. 

4 

3.1. Interface Probe Preconditioner 

Interface probe preconditioning is a family of methods for approximating the true Schur com- 
plement C defined in (2.3) by low bandwidth matrices. We use the nomenclature IP(Ar) to denote 
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the approximation sequence M — Ay - Ek, k — 0,1, 2, . . where Ek is a matrix of semi-bandwidth 
k whicli produces the same action as on a set of 2k A 1 test vectors. Note that when A 

arises from a five-point finite difference discretization botli the IP(0) and IP(1) preconditiohers are 
tridiagonal because Aar is. As k increases beyond 1, M acquires additional diagonals. Selection 
of test vectors of appropriate sparsity patterns enables the coefficients of E k to be read directly ofT 
of the product involving hence the term “probe.” We report only on the row-sum 

conserving IP(0) herein. IP(1) is only rarely more cost effective than IP(0) over the range of non- 
symmetric scalar five-point stencil problems studied herein, and a law of diminishing returns sets 
in as k is increased, 

IP(0) was invented independently by Chan and Eisenstat in 1985, immediately generalized to 
IP(Jfc) in [14], and adopted for variable coefficient symmetric problems in [24] (where it was called 
the “modified Schur complement” method) and for nonsymmetric problems in [25, 26]. Symmetric 
versions of IP(0) and 1 P ( 1 ) have also been employed in [2, 3], Many algebraic and spectral properties 
of banded and circulant probe preconditioners are derived in [13]. The interface probe technique 
has the advantage of being purely algebraic in character, and hence capable of being defined for 
arbitrary operators. It is aesthetically pleasing that the tunable parameter k may be taken from 
the crude approximation of 0 all the way to the full bandwidth exact solution. It has also been 
generalized in a straightforward way to multicomponent systems [26]. However, IP (k) for low fc 
is not expected to be particularly useful for arbitrary matrices. The low k limit is motivated by 
the observation that the elements of C decay rapidly away from the diagonal for elliptic problems. 
In sufficiently simple elliptic problems (e g., those possessing constant coefficients) preconditioners 
described below taking better advantage of this structure are also possible, leaving IP (k) large 
but not unlimited regions of problem parameter space in which to exercise. Interface probing 
has the advantage of being automatically adaptive to spatial variation in the coefficients but the 
disadvantage of not possessing the property of spectral equivalence, a consequence of which is that 
it degrades as the mesh is refined. Experimentally [24], the condition number of the preconditioned 
Schur complement system for the Laplacian goes like /i” 1 / 2 , and this bound is conjectured to be 
the best attainable for any tridiagonal matrix based on experiments with an optimization code in 
[20]. An h~ x I 2 bound is proved for a circulant probe-prcconditioned system with periodic boundary 
conditions on the boundaries normal to the interface in [13], 

3,2, Spectral Preconditioner 

The spectral preconditioner is an exact eigendccomposition of a single interface, rectangular 
domain, constant coefficient convection-diffusion operator described in [12] as a generalization of [7], 
We consider only the Dirichlct case herein, but generalizations to Neumann boundary conditions are 
straightforward. Let an interface of n interior nodes (z.c., h~ l = n + 1) separate two subdomains of 
the same discrete length, and discrete heights m\ and m 2 > respectively, over all of which is satisfied 
the difference equation 


j bx^j T cx’i'-j-ij T -J- (3*1) 

where i denotes the free index along the interface. We may write M ~ DW AW~ l D~ l , where W 
is the discrete sine transform of length n with matrix representation 

i 

[W]ij = y/Wis'mijwh, (3-2) 

* 

D is the diagonal matrix with dements 

[D]i= 0 (,-1)/ \ (3-3) 
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Figure 1: Modes of the Dirichlet problem (3.1) for n = 15. 
(a) |a/c| = 1, j = 1; (b) |a/c| = 1.21, j = 1; (c) |a/c| = 1, 
3 = 8; (d) |a/c| = 1.21, j = 8; (c) |a/c| = 1, j = 15; (f) 
|a/c| = 1.21, j — 15. (The left-hand column of modes are for 
the case of no tangential convection.) 


and A is a diagonal matrix with elements 


[A]i = 


1 

2 


A + 7, mi+1 

Vl-7 t m ‘ +1 


+ 


i + 7 r +1 

1 - 7, m2+1 



- ( J ,)] 2 - Ade, 


(3.4) 


where, in turn, 



The derivation of these formulae (see [12] for full details) begins with the observation that the 
columns of the matrix (DW) are the eigenvectors of the tridiagonal matrix formed by the coefficients 
along the interface, viz., tridiag(a,6,c). Sample such modes are plotted in Figure 1 for two different 
values of the ratio |a/c| corresponding to zero and constant nonzero tangential components of the 
convection. The nonvanishing first-derivative convection term has the effect of multiplying the 
sinusoids by an exponential. 

The philosophy of using the spectral preconditioner for arbitrary interfacial systems is that 
of solving an approximate (constant coefficient) problem exactly, rather than an exact (general 
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coefficient) problem approximately. One of its advantages is that it can be defined without requiring 
the ability to solve problems in adjacent subdomains, as required by the interface probe technique. 
All that is needed is some averaging rule to obtain the coefficients a through e from the data of 
the associated regions. All our tests herein employ a simple average of the coefficients along the 
interface alone. Another advantage is its automatic adaptivity to domain aspect ratio, since the 
boundary conditions are built into the derivation. We note that application of M~ l is inexpensive: 
two one-dimensional FFTs sandwiched between three diagonal matrix multiplications. 

3.3. Spectral Probe Preconditioner 

The spectral probe preconditioncr, introduced here for the first time, is conceptually a hybrid 
of the interface probe and the spectral preconditioners. Spectral probing assumes a form for the 
eigenvectors of C like that derived for the constant coefficient operator of the previous subsection 
(again based on spatially averaged coefficients), but then populates the diagonal matrix A by 
probing the true Schur complement, so that some spatial adaptivity is accommodated within a 
spectrally equivalent framework. 

We set M ~ DW AW“ l D~ l where W and 1) are defined as above (or where D is alternatively 
simply set to the identity matrix, corresponding to a = c, for reasons which will become clear in 
Section 4). A is then determined by probing with the interface vector of all l’s. This is the same as 
the standard test vector for IP(0). To be explicit, we read off the elements of A from the equation 

W~ l D- l CDW* 1 - A*l. 

The action of C is computed by means of a pair of subdomain solves using DW * 1 as the interface 
boundary condition. Note that the spectral probe proconditioner reduces to the spectral precon- 
ditioner in the constant coefficient Dirichlet case, because then C is exactly diagonalized by the 
given similarity transform, 

3.4. Laplacian Square-root Preconditioner 

As a base-line reference, and because it appears throughout the literature, we include tests 
with a method based on the square-root of the one-dimensional Laplacian operator, easily written 
as: 

M = WAW~\ 

where A is now the diagonal matrix with elements [A],- = 2^/oL We sometimes denote this oper- 
ator as Dryja’s preconditioner because of its popularization in this context in [16]. More general 
discrete antecedants were considered in [18]. It is difficult to pinpoint the discovery of the spectral 
equivalence of this operator to the Schur complement of the Laplacian, since the continuous analog 
of this equivalence has been known for some time. This preconditioner is expected to be good in 
diffusion-dominated problems, or in the discrete limit h — * 0. 

Note that this preconditioner is distinct from the spectral preconditioner (§3.2) for the Lapla- 
cian. Dryja’s preconditioner achieves a constant bound on the number of iterations as the mesh is 
refined, but the constant is generally higher than that achievable with the coefficient and aspect 
ratio adaptability of the previous two techniques. The literature also records two important precon- 
ditioners intermediate between the Dryja and spectral techniques, namely [19] and [5]. The latter, 
the Neumann-Diriclilct preconditioncr, contains some of the adaptive capabilities of the spectral 
preconditioner since it relies on subdomain solves in its construction and hence contains much coeffi- 
cient information. It is similar to probing techniques in this regard. In fact, the Neumann-Dirichlet 
preconditioner is exact in problems possessing symmetry across the separator set. All four of the 
techniques of [5, 7, 16, 19] were tested in [24], but for brevity we test only the extremes here. 

3.5. Tangential Preconditioner 

Finally, we consider a simple preconditioncr possessing partial adaptivity, a lower-dimensional 
restriction of the operator to the interface created by setting all of the normal derivative terms in 
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the operator to zero and retaining just the remainder in M. For (1.1) these are just the tangential 
derivative terms. The obvious motivation for this technique is that it is simple and is expected 
to work well in the limit of strong convection along the interface, a limit which turns out to be 
troublesome for the spectral and spectral probe preconditioners. In addition, its very satisfactory 
behavior in the multidomain experiments in [21] recommend it. For reasons not yet theoretically 
explained, it performs very well in conjunction with the block triangular form of the the overall 
preconditioner described in Section 2. A minor disadvantage is its requirement of partial knowledge 
of the differential operator, rather than simply the elements of the discrete operator A* To be 
specific, it is necessary to store separately the contributions to A arising from the normal derivative 
terms, and all other terms. , 

4. Numerical Experiments 

All of the experiments to follow except for those of Table 14 are posed on the unit square (/ = 1 
in the definition of the Reynolds number, Re) with homogeneous Dirichlet boundary conditions. 
The five different continuity-satisfying flow fields tested are shown in Figure 2(b)-(f) , along with 
a purely diffusive baseline case (Figure 2(a)). When Reynolds numbers are reported below for the 
variable coefficient cases, they are always based on the maximum velocity in the region. (See [28] 
for details on the jet and cell flows and other experiments on this particular problem set.) The 
interface divides the rectangle into equal upper and lower portions, as marked on the figure in the 
dashed line. In addition to cases with constant diffusion, we study in §4.3 a convectionless case 
with piecewise constant, but disparate, dilTusi vitics on either side of the interface. 

There is a constant source term of unit strength in the interior. Although it is special, a zero 
initial guess for the solution vector is employed throughout, since this will usually be the natural 
choice when (1.1) arises for a Newton increment, as part of an outer nonlinear iteration. The 
performance of the preconditioners is measured by the number of iterations required to reduce the 
initial residual by a factor of 10~ 5 , regardless of the mesh resolution. The tables of iteration counts 
arc grouped by subsections into four sets of experiments. 

4,1. Sensitivity to Mesh Refinement 

Tables 1 through G examine a constant Re situation as the (uniform) mesh is refined by three 
successive powers of 2. Of course, the discrete diffusion term, the Laplacian, becomes more and 
more dominant with each refinement of the grid, since it scales as h ~ 2 as compared with the h~ l 
scaling of the convection term. This is the asymptotic limit for which D and its relatives S and 
SP are designed. In the first table, the Laplacian is studied in isolation (Re = 0). In the next 
five convective cases, Re = 1G. For the coarsest mesh (h~ l = 8), the contributions to the diagonal 
of the discrete operator from the two terms are equal at this Reynolds number (the cell Reynolds 
number, Re c s ch/e , is 2). 
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Table 1: Iteration counts for the pure dilTusion problem as a 
function of mesh parameter for two different preconditioner 
structures and five different interface blocks. 
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Figure 2: Streamfunction contour plots of the two-dimen- 
sional flow fields represented by c in the numerical experi- 
ments, (a) Pure Diffusion; (b) Normal Convection; (c) Tan- 
gential Convection; (cl) Skew Convection; (e) Jet Convection 
(the domain is the right half of a symmetric flow field); (f) 

Cell Convection. 

The S (spectral), SP (spectral probe), and I) (Dryja) columns of Table 1 reveal their exactness 
or spectral equivalence, respectively. Because iteration count is a threshold measurement, most of 
the data is subject to ±1 perturbation upon modest adjustment of the convergence tolerances, but 
the S and SP residuals at convergence are zero to machine precision. The deterioration of IP like 
some negative power of h is evident on both the structurally symmetric (Bi) and block triangular 
(B 2 ) sides of the table. The tangential preconditioner is the only one with markedly different 
performance depending upon the structure of B. Here, as below, it is excellent in conjunction with 
the block triangular form. 
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Table 2: Same as Table 1, except for normal convection at 
a Reynolds number of 16. 


Table 2, for a normal convection problem, is similar to Table 1 except that IP improves slightly 
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as each of the terms ytai/lj - , 1 /Iks and ^23 being approximated by a diagonal matrix becomes 

less important relative to A :Ki because one of the coupling matrices is small. For instance, if the 
convection is from subdomain 1 into subdomain 2, and A 32 are weak. 
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Table 3: Same as Table 1, except for tangential convection 
at a Reynolds number of 16. 


The importance of the D matrix in the spectral preconditioner is evident in Table 3 in which a 
tangential convection problem is considered. The version of SP employed in this study approximates 
the D in its definition as the identity; using the true D here would reproduce the spectral results 
in this constant coefficient case, just as in the previous two tables in which D = I anyway. Though 
SP and D are spectrally equivalent, they require an order of magnitude more iterations than S, and 
are surpassed by IP in the smaller problem range on the structurally symmetric side, and by the 
tangential preconditioner on the block triangular side. 
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Table 4: Same as Table 1, except for skew convection at a 
Reynolds number of 16. 


Table 4 contains the last of the constant coefficient test examples, featuring a skew convection 
(inclined at 45 degrees relative to the interface). Overall results are not very different from the 
purely tangential convection case. Most of the entries in the skew table lie at or between cor- 
responding entries in the normal and tangential tables. Those of the tangential preconditioner, 
however, are often worse in the intermediate skew convection case than at either extreme. 
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Table 5s Same as Table 1, except for jot convection at a 
Reynolds number of 16. 
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The jot case recorded in 'Pablo 5 tends to level the preconditioner landscape because the 
constant coefficient approximation of S is no longer exact, S remains the best technique as h~ l 
increases, but its margin of superiority over other spectrally equivalent techniques (with D as a 
nonadaptive extreme) is small. 
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Table 6: Same as Table 1, except for cell convection at a 
Reynolds number of 1G. 


The cell case of Table 6 is the greatest equalizer among the test cases, because the interface 
cuts a zone of recirculation, f.e., there is normal flow across it in both directions, and none of the 
methods holds an edge of superiority. Performance under the block tridiagonal preconditioner is 
notably uniform for the hist four methods. 

4*2. Sensitivity to Reynolds Number 

Tables 7 through 11 examine the influence of increasing Reynolds number. Values of Re of 
0, 4, 16, G4, 256, and 1024 are considered at h~ x = 64. Thus, the third row of each table in this 
subsection is the same as the hist row of the table of corresponding flow type in the first set, ‘and the 
first row of each table is the same as the hist row of the pure diffusion case in Table 1. Progressing 
down the rows of the table the nonsymmetry of the operator increases. Between rows* four and 
five the convection terms begin to contribute more heavily than the diffusion terms to the diagonal 
elements of A . 
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Table 7: Iteration counts for the normal convection prob- 
lem at a constant mesh parameter of 1/64 as a function of 
Reynolds number for two different preconditioner structures 
and five different interface blocks. 


Table 7 shows that in the presence of constant normal convection, techniques S and SP remain 
exact at any Re, while IP catches up at high Re, and D and T successively worsen. (For D, this 
is the drawback of finite h in a method derived for h — ► 0.) Qualitatively, the trends are the same 
for either preconditioner structure, although the tangential method continues to be much better in 
the block triangular formulation. 

The Achilles’ heel of the spectral technique appears when there is strong convection tangential 
to the interface, as seen in Table 8. In this limit in which \a/c\ differs sufficiently from unity the 
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Table 8: Same as Table 7, except for tangential convection. 
The hyphen denotes loss of precision, and the “>” more than 
30 iterations. 


latter terms in D , which have this ratio raised to as much as (n — l) st power, can approach the 
machine epsilon (or its reciprocal, depending upon the direction of the convection). In reference to 
(3.3), we note that for n — 64, (a/c)^" 1 )/ 2 « f,nach ~ 10” 16 when ( ajc ) « 10(~ 32 / 63 ) « 0.31. Under 
upwind differencing, it only takes a cell Reynolds number of 2 to produce a ratio of 3 between the 
upwind and downwind stencil coefficients a and c. Thus, Re c = 2 is the borderline of stability 
for the spectral preconditioner with respect to tangential convection. In the tables, the Re = 64 
row corresponds to a cell Reynolds number of unity, and the Re = 256 row to a cell Reynolds 
number of 4; thus, they straddle the transition. GMRES iterations based on the spectral interface 
preconditioner do terminate for the hyphenated entries, but the residuals based on the resulting x 
vectors shows complete loss of precision. 

The spectral probe technique does not lose precision, because of the assumption that D = 7; 
however, a diagonal approximation for W~ l CW is poor, and it simply takes too long to converge. 
The Dryja preconditioner, which makes no attempt to adapt to the strong directionality of the 
problem also deteriorates with Re. At low Re where M and C are close, the B\ iteration count 
is lower by one; but at high Re, where isotropic M is very different from unidirectional C, the Bi 
iteration count is better. The interface probe technique meanwhile improves with Re as it captures 
more of the increasingly diagonally dominant problem within its own sparsity structure. Finally, the 
tangential technique is excellent for a tangentially dominated operator. The cross-diffusion which 
it neglects becomes of negligible importance as the problem effectively decouples into independent 
problems for An, ^22* and A33 in which the upstream boundary condition is all important. 
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Table 9: Same as Table 7, except for skew convection. 


Most of the observations of the high Re tangential flow also apply to the skew flow in Table 9, 
however, the latter differentiates between IP, which repeats its tendency to improve as Re grows, 
and T, which no longer matches the physics of the problem, and is worse than IP, although it is 
still superior to the Fourier-based methods as a module of the block triangular preconditioner. 
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Wo also nolc that the spectral probe technique captures a significant part of the physics in 
this case, adapting to the co-dominant normal convection and tending to a constant iteration count 
without breaking down. It thus rescues the spectral technique and indicates how the robustness of 
the spectral preconditioner can be maintained with a compromise in efficiency. In a general purpose 
code, the elements of the D matrix could differ from unity but be bounded artificially, allowing 
partial tangential adaptivity with full normal adaptivity. 
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Table 10: Same as Table 7, except for jet convection. 


As in the spectral equivalence tests in Fable 5, the jet case recorded in Table 10 tends to 
diminish the extremes of preconditioner behavior relative to the uniform skew convection case. 
The IP and SP results worsen while the D and T methods nearly hold their own relative to Table 9. 
The pure spectral method survives at a higher overall Reynolds number because the tangential 
velocities at the interface are lower than the maximum core velocity of the jet, to which Re is 
scaled. 



Structurally Symmetric 

Block Triangular 

Re 

IP 

S 

SP 

I) 

T 

ip 

S 

SP 

D 

T 

0 

11 

l 

1 

4 

11 

12 

2 

2 

5 

4 

4 

11 

3 

3 

4 

1 1 

12 

4 

4 

5 

5 

16 

11 

5 

5 

5 

11 

12 

6 

6 

6 

6 

64 

13 

8 

7 

8 

13 

14 

9 

8 

9 

8 

256 

18 

14 

11 

14 

16 

19 

15 

12 

14 

15 

1024 

25 

23 

16 

23 

26 

26 

23 

17 

23 

24 


Table 11: Same as Table 7, except for cell convection. 


Again, the cell case of Table 11 is an equalizer for most of the methods; however, the perfor- 
mance of the spectral probe technique is singularly good. This is easily explained as follows: the 
average tangential velocity along the interface is zero because of the symmetry of Figure 2(f), so 
D = I for both S and SP. However, S also employs an zero average for the normal velocity, whereas 
SP adapts to strong inflow and outflow along opposite halves of the interface. The performance 
of IP, which improves with Re in all previous tables, deteriorates in this table because no pair 
of coupling matrices is weak (see comments on Table 2) under recirculation. Performances under 
the structurally symmetric and block tridiagonal schemes of the tangential preconditioner become 
similar at high Re. The recirculating cell flow is in some sense a worst case for a single interface. If 
the domain of Figure 2(f) is further decomposed by a vertical interface, putting a vertex in the cen- 
ter, all four interfaces would be free of two-signed velocity components, and easier to precondition. 
(The cross-point preconditioning then becomes an important subject.) 
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Figure 3: Contour plots of the solution to V • eVu = / 
with homogeneous Dirichlet boundary conditions and / = 1 
for various ratios of the (piecewise constant) diffusivity in 
the adjoining subdomains. “Ratio” is (lower / ( upper an( i the 
product Ciowertupper is maintained at unity. 

4.3. Piecewise Constant Diffusivity 

This subsection focuses on the effectiveness of the preconditioners in heterogeneous problems. 
With reference to Figure 3, we assign a higher diffusivity in the lower domain than in the upper. 
We report convergence results in two limits: fixed diffusivity ratio and increasing h~ l (Table 12) 
and fixed h _1 and increasing diffusivity ratio (Table 13). 

The performance of the two probe and the tangential preconditioners in the terraced diffusivity 
case in Table 12 is identical to their performance in the uniform case of Table 1. However, the 
Dryja preconditioner is slightly worse and the spectral preconditioner (based on the arithmetically 
averaged diffusivity) is much worse than in the uniform case. In fact, S is the worst preconditioner 
in the terraced diffusion case while SP is the best, the probe supplying crucial information. 

As the ratio of diffusivitics increases for a fixed problem size, as in Table 13, all methods 
asymptote monotonically to fixed convergence rales, as shown theoretically in [13]. The physical 
interpretation of the asymptotically large diffusivity ratio is that virtually all of the variation 
of solution in response to the (fixed) forcing occurs in the subdomain with the lesser diffusivity. 
Figure 3 illustrates this phenomenon. The contours are kinked at the interface since it is the normal 
component of the diffusive flux, cVu, not Vtt itself, which must balance on either side. The solution 
along the interface is asymptotically the boundary value of zero. The spectral probe method is a 
singularly good performer in this limit. 

4.4. Sensitivity to Aspect Ratio . 

Table H examines the constant diffusion case under aspect ratios ranging from 1:^ (a squat 
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Table 12: Iteration counts for the step diffusion problem as 
a function of inesh parameter for two different preconditioner 
structures and five different interface blocks. The ratio of the 
diffusion coefficients of the two subdomains is 64. 
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Table 13: Iteration counts for the step diffusion problem at 
a constant mesh parameter of 1/32 as a function of diffusiv- 
ity ratio for two different preconditioner structures and five 
different interface blocks. 


rectangle with subdomains just two cells high) to 1:2 (a tall rectangle composed of two square 
subdomains). Note that the discrete length of the interface, n, remains constant at 63 in all 
examples, while heights mi and m 2 vary from 1 to 63. 
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Table 14: Iteration counts for the pure diffusion problem 
at a constant mesh parameter of 1/64 as a function of as- 
pect ratio for two different preconditioner structures and five 
different interface blocks. 

We confirm that S and SP arc completely adaptive to aspect ratio in this constant coefficient 
problem, which is not true of any other method, including Dryja’s. The tangential preconditioner 
is understandably good when the narrow (more strongly boundary influenced) direction is the 
tangential one, and poorer when this aspect ratio is reversed. T.he nonmonotonic character of 
IP is interesting, showing that it adapts well to either extreme and is poorest in the balanced 
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intermediate case. 

5. Conclusions 

We conclude by pulling together some overall assessments of the preconditioners tested in the 
previous section. The interface probe technique is the most general purpose and robust of the meth- 
ods. It is always definable and adapts well to high Re and extreme aspect ratio, but is nonoptimal 
at high h~ ] and is occasionally the worst method. It does very well in predominantly unidirectional 
strongly convective flows. Generalizations to multiple interfaces, multiple components, and inexact 
subdomain solves are straightforward, but not pursued herein (see [26]). 

The spectral method is mathematically the method of choice asymptotically in ft” 1 , where 
physically well resolved problems should end up. However, since it is based on a global constant- 
coefficient approximation it does not perform well in heterogeneous problems. Furthermore, high 
cell Reynolds numbers can cause it to go unstable, and flow fields whose actual values along the 
interface differ greatly from their average values are poorly represented. A stabilizing technique 
was proposed which could preserve the robustness of the spectral method at high cell Reynolds 
numbers, namely selection of an exponent base for D in (3.3) closer to unity than the true |a/c|. 

The spectral probe method is as good as the spectral method when W alone is a good ap- 
proximation to the eigenvectors of C (i.e., low tangential convection). A JD-modulated version of 
spectral probe can be just as effective (and just as vulnerable to strong tangential convection) as 
the pure spectral method in a constant coefficient problem. SP adapts better than S to normal 
convection variation and it adapts perfectly to piecewise constant heterogeneities in the diffusivity. 
As a “probe” method, it shares the coding disadvantages of IP. 

The Dryja preconditioner is never exclusively the best method, but is as good as either S or SP 
in a variable coefficient problem in the limit h —> 0, where the diffusive contributions to A dominate. 
However, the extra cost of S is insignificant compared to D, and SP costs only extra subdomain 
solves in the pre-processing, so these techniques (suitably stabilized for tangential convection) will 
almost always be preferable in applications. 

All of the above techniques are relatively insensitive to the choice of the overall preconditioner 
structure. The tangential interface preconditioner is an exception, for reasons yet to be explained 
theoretically. It is much better under the block triangular form of the overall preconditioner, and 
is very competitive with the other techniques under physically predictable circumstances, namely 
tangentially dominant convection or narrow aspect ratio. It is also simple to code and has been 
successfully generalized to multicomponent systems (see [22]). 

We note that when exact subdomain solves are performed, the block triangular form of the 
preconditioner, i?2, with its one-directional flow of information from the separators to the interiors, 
is almost always preferable to the structurally symmetric form B\ in terms of execution time, 
since a full set of subdomain solves is saved at each iteration and the iteration counts are usually 
comparable. The structurally symmetric form is also obviously useful when A is itself symmetric, 
since it then admits preconditioned conjugate gradients rather than GMRES as the iterative solver. 

Clearly, a user who understands his problem physically and is willing to customize can do well 
by choosing among a variety of interface preconditioner modules, perhaps using different ones on 
differently resolved grids within the same overall solution procedure. Hopes of finding a single “best” 
method from among those considered must clearly be dismissed, but the field is still young. It is clear 
that general purpose adaptivity will require some form of “probing” via subdomain solves; taken to 
the limit, one obtains C directly. Such probing has an associated cost comparable to an iteration 
step, and must be undertaken conservatively. Future developments which iteratively improve the 
interface preconditioning based on accumulated subdomain solve data would be welcome, and so 
would more hybrids along the lines of the spectral probe which incorporate both analytical and 
numerical data. 
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